Aeroacoustic post-processing with MapReduce

Author

J. W. Nichols

Abstract

Present-day large-scale computational fluid dynamics simulations can easily produce tens, if not hundreds, of terabytes of useful data. While computational capacity continues to increase according to Moore’s law, the speed of input-output (I/O) to data storage systems has not increased at the same rate, so the gap between processing speed and storage bandwidth is growing exponentially. This trend is fueled in part by the fact that supercomputer power is most often measured in floating-point operations per second (FLOPS), while other metrics receive less attention. If the gap continues to grow, it will drive scientific data processing toward an in-situ paradigm in which very little data are ever stored. Instead, “post-processing” routines will need to run “on the fly,” in tandem with the simulations that they analyze, and any different analysis desired later will require re-running the simulation. This paradigm may be at odds with usual scientific procedure, in which data are collected and then analyzed progressively: understanding one aspect of the data naturally leads to a host of additional questions, and scientific data sets often contain unexpected effects that cannot be predicted ahead of time.

The purpose of this brief, therefore, is to explore new technologies enabling fast access to large quantities of stored data as an alternative to the in-situ paradigm. In particular, we look to data-access techniques developed for web search engines like Google, which must constantly query enormous databases pertaining to the state of the Internet. Such databases are too large to be stored on any one disk; instead, they reside on thousands or tens of thousands of disks. The solution to fast access lies in expressing a query in a special format known as “MapReduce,” which is itself a programming paradigm. If a post-processing task is expressible in this fashion, MapReduce enables the code implementing the task (map phase) to be sent directly to the data residing on distributed disks, rather than requiring the data to be sent to the code. Only a small amount of data representing the desired result is communicated at the end (reduce phase). In this way, effective I/O throughput may be dramatically increased.
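The map and reduce phases described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation and uses no real MapReduce framework: the function names (`map_chunk`, `combine`, `overall_spl`) and the sample values are hypothetical, the "distributed disks" are plain in-memory lists, and the chosen task (an overall sound pressure level from zero-mean pressure fluctuations, referenced to 20 µPa) is only an assumed example of an aeroacoustic post-processing query.

```python
from functools import reduce
import math

P_REF = 20e-6  # standard reference pressure for airborne sound, in Pa


def map_chunk(pressures):
    """Map phase: would run next to the stored chunk.

    Returns a tiny summary (sum of squares, sample count) instead of
    shipping the raw samples anywhere.
    """
    return (sum(p * p for p in pressures), len(pressures))


def combine(a, b):
    """Reduce phase: merge two partial summaries into one."""
    return (a[0] + b[0], a[1] + b[1])


def overall_spl(chunks):
    """Overall sound pressure level in dB from distributed chunks.

    Assumes the samples are zero-mean pressure fluctuations in Pa.
    """
    total_sq, count = reduce(combine, map(map_chunk, chunks), (0.0, 0))
    mean_sq = total_sq / count
    return 10.0 * math.log10(mean_sq / P_REF**2)


# Hypothetical pressure-fluctuation samples split across three "disks".
chunks = [[0.02, -0.02], [0.02, -0.02, 0.02], [-0.02]]
spl = overall_spl(chunks)
print(f"Overall SPL: {spl:.1f} dB")
```

Each chunk contributes only two numbers to the reduce phase regardless of how many samples it holds, which is the property that lets the communicated result stay small while the bulk data stay on disk.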

Similar resources

On the aeroacoustic properties of a beveled plate

The flow around a beveled flat plate model with an asymmetric 25 degrees trailing edge with three rounding radii is analyzed using a Navier-Stokes based open source software package OpenFOAM in order to predict the aeroacoustic properties of the models. A Large Eddy Simulation with a dynamic Smagorinsky and implicit model are used as closure model for the flow solver, and are compared regarding...

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

A New Parallelization Method for K-means

K-means is a popular clustering method used in data mining area. To work with large datasets, researchers propose PKMeans, which is a parallel k-means on MapReduce [3]. However, the existing k-means parallelization methods including PKMeans have many limitations. It can’t finish all its iterations in one MapReduce job, so it has to repeat cascading MapReduce jobs in a loop until convergence. On...

Beamforming of aeroacoustic sources in the time domain

A classical array processing technique used for the analysis of aeroacoustic sources is the frequency-domain beamforming technique. The use of this technique requires an assumption on the stationarity of the sources as it works with a time-averaged estimate of the cross-spectral matrix. As a consequence this technique provides an estimation of the average position (in space and time) of an aero...

Publication date: 2013
